Chapter 5

Measures of Dispersion

What is Dispersion as a Concept?

What is Dispersion within Statistics?

It is the amount of spread or variability among raw scores in a distribution.

Kurtosis - the Cousin of Skewness

The ‘sharpness’ of the peak of a frequency-distribution curve.


The ‘sharpness’ of the peak is tied to the tailedness (i.e., how often outliers occur) of the distribution.

Types of Kurtosis

Leptokurtosis

data tightly clustered around the mean, with a sharper peak and heavier tails (more outliers) than a normal distribution

Mesokurtosis

data closely resemble a normal distribution

Platykurtosis

data points are more evenly spread out, with a flatter peak and lighter tails (fewer outliers) than a normal distribution
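As a rough illustration, the three types can be told apart by the sign of the excess kurtosis (the fourth standardized moment minus 3, which is 0 for a normal distribution). The sketch below assumes that definition; the two data sets, the function names, and the `tolerance` cut-off are hypothetical choices made only for this example.

```python
def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


def classify(values, tolerance=0.5):
    """Label a data set by the sign of its excess kurtosis.

    The tolerance is an arbitrary cut-off for "close to normal".
    """
    k = excess_kurtosis(values)
    if k > tolerance:
        return "leptokurtic"
    if k < -tolerance:
        return "platykurtic"
    return "mesokurtic"


# Hypothetical data: most values at the mean plus two outliers.
peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 10]
# Hypothetical data: values spread evenly, no clustering.
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(classify(peaked), round(excess_kurtosis(peaked), 2))
print(classify(flat), round(excess_kurtosis(flat), 2))
```

Statistical packages often apply small-sample corrections, so their kurtosis values may differ slightly from this plain moment-based version.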

Measures of Dispersion for Categorical Variables

The Variation Ratio


  • A measure of dispersion.

  • The proportion of cases which are not in the mode category.

  • One of the simplest measures of dispersion that can be used with categorical (nominal) variables.

The Variation Ratio

\[v = 1 - \frac{f_m}{n}\]


\(f_m\) = the frequency (number of cases) of the mode

\(n\) = sample size

Variation Ratio Example

We asked 1,984 individuals at the University what their favorite color was. We were left with four colors: red, orange, green, and blue. What is the variation ratio?


       colors
Red        43
Orange    211
Green     341
Blue     1389

\[v = 1 - \frac{f_m}{n}\]
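One way to check the answer is a short Python sketch that applies the formula to the counts in the table. The dictionary and function names here are illustrative only.

```python
# Frequencies copied from the color table above.
color_counts = {"Red": 43, "Orange": 211, "Green": 341, "Blue": 1389}


def variation_ratio(counts):
    """Return 1 - (mode frequency / n) for a category -> frequency mapping."""
    n = sum(counts.values())
    mode_frequency = max(counts.values())
    return 1 - mode_frequency / n


n = sum(color_counts.values())
mode_color = max(color_counts, key=color_counts.get)
v = variation_ratio(color_counts)

print(f"mode = {mode_color}, n = {n}, v = {v:.3f}")
# mode = Blue, n = 1984, v = 0.300
```

So about 30% of respondents chose a color other than the modal color, blue.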

Recap

Dispersion


Kurtosis

  • Leptokurtosis

  • Mesokurtosis

  • Platykurtosis

The Variation Ratio

Measures of Dispersion for Continuous Variables

Range


Variance


Standard Deviation

Range

A measure of the span of data.


A high range value indicates there is more dispersion, and the lower the range, the less the dispersion


\[\text{Range} = \text{Maximum Value} - \text{Minimum Value}\]

Range Example

Below is a data frame that contains the average salary within each district. Find the range.


          Salary
District1  87000
District2  91000
District3  66500
District4  98500
District5  96500
District6  97550
District7  97900
District8  97990

\[?\]
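To check the answer, a minimal Python sketch can subtract the smallest salary from the largest. The variable names are illustrative; the values are copied from the data frame above.

```python
# Average salary per district, copied from the data frame above.
salaries = {
    "District1": 87000,
    "District2": 91000,
    "District3": 66500,
    "District4": 98500,
    "District5": 96500,
    "District6": 97550,
    "District7": 97900,
    "District8": 97990,
}

highest = max(salaries.values())
lowest = min(salaries.values())
salary_range = highest - lowest

print(f"{highest} - {lowest} = {salary_range}")
# 98500 - 66500 = 32000
```

Note that the range depends only on the two most extreme values: District3 alone accounts for most of the 32,000 spread.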

Variance

A measure of how spread out the observed values are around the mean.

\[s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}\]

\(s^2\) = sample variance
\(x_i\) = raw value
\(\bar{x}\) = mean of all raw scores
\(n\) = number of observations

Break Down of Equation

  1. Find the \(\bar{x}\)

  2. Subtract the mean from each score \((x_i - \bar{x})\)

  3. Square the deviation score \((x_i - \bar{x})^2\)

  4. Add the squared deviations \({\sum(x_i - \bar{x})^2}\)

  5. Divide by \({n-1}\)

\[s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}\]
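The five steps can be sketched as a small Python function. The function name and the `scores` list are hypothetical and serve only to illustrate the steps.

```python
def sample_variance(scores):
    """Sample variance computed with the five steps listed above."""
    n = len(scores)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(scores) / n                         # 1. find the mean
    deviations = [x - mean for x in scores]        # 2. subtract the mean
    squared = [d ** 2 for d in deviations]         # 3. square each deviation
    sum_of_squares = sum(squared)                  # 4. add the squared deviations
    return sum_of_squares / (n - 1)                # 5. divide by n - 1


scores = [2, 4, 4, 6, 9]  # hypothetical raw scores
print(sample_variance(scores))
# 7.0
```

Here the mean is 5, the squared deviations are 9, 1, 1, 1, and 16, their sum is 28, and 28 / 4 = 7.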

Why n-1?

Suppose we draw n independent observations from a population of size N, with an unknown population mean \((\mu)\) and an unknown population variance \((\sigma^2)\).

Ideally, we would estimate \((\sigma^2)\) with the average squared distance from the true mean,

\[s^2 = \frac{\sum(x_i - \mu)^2}{n}\]

But we can’t! 😢

Because we don’t know \((\mu)\).

We Sample it!

Since we don’t know \((\mu)\), we use our best estimate of it, which is the sample mean \((\bar{x})\).

So let’s pull out the population parameter \((\mu)\) and plug in our sample statistic \((\bar{x})\).

\[s^2 = \frac{\sum(x_i - \bar{x})^2}{n}\]

But another small problem pops up!

The Problem (Population)


Population and the true \({\mu}\)


\[(-5,0)*--*-*-*----(0,0)--\stackrel{\mu}{|}--*--*--*--- (5,0)\]

The Problem (Sample)


Sample and the \(\bar{x}\)


\[(-5,0)*--*-*-----(0,0)--\stackrel{\mu}{|}--------- (5,0)\]


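The sample mean can fall below or above the true \({\mu}\). Because \(\bar{x}\) is computed from the sample itself, the sample values always sit at least as close to \(\bar{x}\) as they do to \({\mu}\). The squared deviations from \(\bar{x}\) therefore tend to be too small, and dividing their sum by n tends to underestimate the population variance.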

The Solution

We modify \(s^2 = \frac{\sum(x_i - \bar{x})^2}{n}\) to \(s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}\)


And this provides an unbiased estimator of the population variance when using a sample.
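The effect can be checked exactly on a toy case. The sketch below assumes a small hypothetical population and lists every possible sample of size 3 drawn with replacement, then averages the two versions of the variance across all of those samples. The population values, the sample size, and all names are illustrative choices.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]  # hypothetical population
N = len(population)
mu = Fraction(sum(population), N)
# True population variance: squared distances from mu, divided by N.
population_variance = sum((x - mu) ** 2 for x in population) / N

n = 3  # sample size


def sum_of_squares(sample):
    """Sum of squared deviations from the sample's own mean."""
    x_bar = Fraction(sum(sample), len(sample))
    return sum((x - x_bar) ** 2 for x in sample)


# Every possible ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

average_dividing_by_n = sum(sum_of_squares(s) / n for s in samples) / len(samples)
average_dividing_by_n_minus_1 = (
    sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)
)

print("population variance:       ", population_variance)
print("average when dividing by n:", average_dividing_by_n)
print("average when dividing by n-1:", average_dividing_by_n_minus_1)
```

Averaged over all samples, dividing by n falls short of the population variance by a factor of (n-1)/n, while dividing by n-1 matches it exactly. Exact fractions are used so the comparison is not blurred by rounding.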

In Sum

When we are measuring a population, divide by n. (hint: There will be a \(\mu\) in the deviation score.)


When we are measuring a sample, divide by n-1. (hint: There will be a \(\bar{x}\) in the deviation score.)

Variance Example 1

1) Find the \(\bar{x}\)

2) Subtract the mean from each score \((x_i - \bar{x})\)

3) Square the deviation score \((x_i - \bar{x})^2\)

4) Add the squared deviations \({\sum(x_i - \bar{x})^2}\)

5) Divide by \({n-1}\)

   District Salary
1 District1    832
2 District2    931
3 District3   1468
4 District4   1021
5 District5   1039
6 District6   1515
7 District7   1138
8 District8    620

\[s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}\]
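A worked check of this example, sketched in Python with one line per step. The variable names are illustrative; the salaries are copied from the table above.

```python
import statistics

# Salaries copied from the table above.
salaries = [832, 931, 1468, 1021, 1039, 1515, 1138, 620]

mean_salary = statistics.mean(salaries)              # 1) find the mean
deviations = [x - mean_salary for x in salaries]     # 2) subtract the mean
squared = [d ** 2 for d in deviations]               # 3) square each deviation
sum_of_squares = sum(squared)                        # 4) add the squared deviations
variance = sum_of_squares / (len(salaries) - 1)      # 5) divide by n - 1

print(mean_salary)         # 1070.5
print(sum_of_squares)      # 642878.0
print(round(variance, 2))  # 91839.71
```

The standard library's `statistics.variance(salaries)` uses the same n-1 divisor and can serve as a cross-check.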

Variance Example 2

1) Find the \(\bar{x}\)

2) Subtract the mean from each score \((x_i - \bar{x})\)

3) Square the deviation score \((x_i - \bar{x})^2\)

4) Add the squared deviations \({\sum(x_i - \bar{x})^2}\)

5) Divide by \({n-1}\)

   District Salary
1 District1    983
2 District2    993
3 District3   1047
4 District4   1002
5 District5   1004
6 District6   1051
7 District7   1014
8 District8    962

\[s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}\]
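Worked through by hand, the five steps for this table give:

\[\bar{x} = \frac{983 + 993 + 1047 + 1002 + 1004 + 1051 + 1014 + 962}{8} = \frac{8056}{8} = 1007\]

\[\sum(x_i - \bar{x})^2 = (-24)^2 + (-14)^2 + 40^2 + (-5)^2 + (-3)^2 + 44^2 + 7^2 + (-45)^2 = 6416\]

\[s^2 = \frac{6416}{8-1} \approx 916.57\]

These salaries sit much closer to their mean than those in Example 1, so the variance is far smaller.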

Standard Deviation

A measure of how far observed values typically fall from the mean.

It’s simply the square root of our variance!

\[s=\sqrt{\frac{\sum (x_{i} - \bar{x})^2}{n-1}}\]

\(s\) = sample standard deviation
\(x_i\) = raw value
\(\bar{x}\) = mean of all raw scores
\(n\) = number of observations

Why square root it?

In our variance equation, we squared each deviation score \((x_i - \bar{x})\) to get rid of our negative values.

\[s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}\]

But by doing that, we created an output in squared units of the original variable, which is hard to interpret against our data.

\[s=\sqrt{\frac{\sum (x_{i} - \bar{x})^2}{n-1}}\]

So by taking the square root, we return the result to the original units of the data, and thus the values make sense.

Standard Deviation Example

Take our calculation from Variance Example 1 and take its square root.

   District Salary
1 District1    832
2 District2    931
3 District3   1468
4 District4   1021
5 District5   1039
6 District6   1515
7 District7   1138
8 District8    620

\[s=\sqrt{\frac{\sum (x_{i} - \bar{x})^2}{n-1}}\]
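A quick way to check the answer is to compute the sample variance and take its square root. This is a minimal Python sketch using the standard library; the variable names are illustrative.

```python
import math
import statistics

# Salaries copied from the table above.
salaries = [832, 931, 1468, 1021, 1039, 1515, 1138, 620]

variance = statistics.variance(salaries)    # sample variance (n - 1 divisor)
standard_deviation = math.sqrt(variance)    # back to the original units

print(round(variance, 2))            # 91839.71
print(round(standard_deviation, 2))  # 303.05
```

Unlike the variance, the result is in the same units as the salaries, so it can be compared directly with the mean of 1070.5.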

Visualize a Standard Deviation

Visualize a Standard Deviation (Large)

Visualize a Standard Deviation (Small)

The Normal Curve

When the data follow a normal distribution, about 68% of values fall within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations.
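These percentages can be reproduced from the normal distribution itself. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) on a standard normal curve; the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")
```

The same proportions hold for any mean and standard deviation, provided the data are normally distributed. Skewed or heavy-tailed data will not follow this rule.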

Have a Wonderful Day!